7 research outputs found

    Finding the global semantic representation in GAN through Frechet Mean

    Full text link
    An ideally disentangled latent space in a GAN admits a global representation of the latent space with semantic attribute coordinates. In other words, considering that this disentangled latent space is a vector space, there exists a global semantic basis in which each basis component describes one attribute of the generated images. In this paper, we propose an unsupervised method for finding this global semantic basis in the intermediate latent space of GANs. This semantic basis represents sample-independent meaningful perturbations that change the same semantic attribute of an image over the entire latent space. The proposed global basis, called Fréchet basis, is derived by introducing the Fréchet mean to the local semantic perturbations in a latent space. Fréchet basis is discovered in two stages. First, the global semantic subspace is discovered by the Fréchet mean in the Grassmannian manifold of the local semantic subspaces. Second, Fréchet basis is found by optimizing a basis of this semantic subspace via the Fréchet mean in the Special Orthogonal Group. Experimental results demonstrate that Fréchet basis provides better semantic factorization and robustness compared to previous methods. Moreover, we suggest a basis refinement scheme for the previous methods. The quantitative experiments show that the refined basis achieves better semantic factorization while constrained to the same semantic subspace given by the previous method. (Comment: 25 pages, 21 figures)
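    The first stage above can be illustrated with a small sketch. Assuming the chordal (projection) metric on the Grassmannian, where the Fréchet mean of subspaces has a closed form via the top eigenvectors of the averaged projection matrices, the following hypothetical helper averages a set of local semantic subspaces; the paper's exact metric, optimization, and second-stage refinement in the Special Orthogonal Group are not reproduced here.

import numpy as np

def frechet_mean_grassmann(local_bases, k):
    # Hypothetical helper: Frechet mean of k-dimensional subspaces under the
    # chordal (projection) metric. The minimizer of sum_i ||P - V_i V_i^T||_F^2
    # over rank-k projections P is spanned by the top-k eigenvectors of the
    # averaged projection matrix.
    D = local_bases[0].shape[0]
    P_mean = np.zeros((D, D))
    for V in local_bases:                        # V: (D, k), orthonormal columns
        P_mean += V @ V.T
    P_mean /= len(local_bases)
    eigvals, eigvecs = np.linalg.eigh(P_mean)    # eigenvalues in ascending order
    return eigvecs[:, -k:]                       # orthonormal basis of the mean subspace

# Toy usage: 10 local 5-dimensional semantic subspaces in a 512-dim latent space.
rng = np.random.default_rng(0)
local_bases = [np.linalg.qr(rng.standard_normal((512, 5)))[0] for _ in range(10)]
V_global = frechet_mean_grassmann(local_bases, k=5)
print(V_global.shape)   # (512, 5)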

    Analyzing the Latent Space of GAN through Local Dimension Estimation

    Full text link
    The impressive success of style-based GANs (StyleGANs) in high-fidelity image synthesis has motivated research to understand the semantic properties of their latent spaces. In this paper, we approach this problem through a geometric analysis of latent spaces as manifolds. In particular, we propose a local dimension estimation algorithm for an arbitrary intermediate layer of a pre-trained GAN model. The estimated local dimension is interpreted as the number of possible semantic variations from the given latent variable. Moreover, this intrinsic dimension estimation enables unsupervised evaluation of disentanglement for a latent space. Our proposed metric, called Distortion, measures the inconsistency of the intrinsic tangent space across the learned latent space. Distortion is purely geometric and does not require any additional attribute information. Nevertheless, Distortion shows a high correlation with global-basis compatibility and supervised disentanglement scores. Our work is the first step towards selecting the most disentangled latent space among the various latent spaces in a GAN without attribute labels.
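    As a rough illustration of the idea, the sketch below estimates a local dimension at a latent point by counting how many singular values of the Jacobian of an intermediate sub-network are needed to capture most of its squared spectrum. The function name, the energy threshold, and the stand-in network are assumptions added for illustration; the paper's exact estimator and its Distortion metric are not reproduced here.

import torch

def estimate_local_dimension(submodel, w, energy=0.99):
    # Hypothetical estimator: number of leading singular values of the Jacobian
    # of an intermediate sub-network needed to capture `energy` of the squared
    # spectrum, read as the number of meaningful local variations at w.
    J = torch.autograd.functional.jacobian(lambda x: submodel(x).flatten(), w)
    s = torch.linalg.svdvals(J.reshape(-1, w.numel()))
    ratio = torch.cumsum(s ** 2, dim=0) / torch.sum(s ** 2)
    return int(torch.searchsorted(ratio, torch.tensor(energy, dtype=ratio.dtype))) + 1

# Toy usage with a stand-in for an intermediate layer of a pre-trained generator.
net = torch.nn.Sequential(torch.nn.Linear(64, 256), torch.nn.ReLU(),
                          torch.nn.Linear(256, 1024))
print(estimate_local_dimension(net, torch.randn(64)))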

    Minimal Width for Universal Property of Deep RNN

    Full text link
    A recurrent neural network (RNN) is a widely used deep-learning network for dealing with sequential data. Imitating a dynamical system, an infinite-width RNN can approximate any open dynamical system on a compact domain. In practice, deep networks with bounded widths are generally more effective than wide networks; however, the universal approximation theorem for deep narrow structures has yet to be extensively studied. In this study, we prove the universality of deep narrow RNNs and show that the upper bound on the minimum width for universality can be independent of the length of the data. Specifically, we show that a deep RNN with ReLU activation can approximate any continuous function or any $L^p$ function with widths $d_x + d_y + 2$ and $\max\{d_x + 1, d_y\}$, respectively, where the target function maps a finite sequence of vectors in $\mathbb{R}^{d_x}$ to a finite sequence of vectors in $\mathbb{R}^{d_y}$. We also compute the additional width required if the activation function is $\tanh$ or more. In addition, we prove the universality of other recurrent networks, such as bidirectional RNNs. Bridging multi-layer perceptrons and RNNs, our theory and proof technique can be an initial step toward further research on deep RNNs.
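    The sketch below only instantiates the kind of architecture the width bound refers to: a stack of narrow ReLU RNN layers of per-layer width $d_x + d_y + 2$. The linear readout and all hyperparameters are assumptions added for illustration; this is not the constructive approximation from the paper.

import torch
import torch.nn as nn

class DeepNarrowRNN(nn.Module):
    # Illustrative architecture only: narrow ReLU RNN layers whose per-layer
    # width d_x + d_y + 2 matches the stated upper bound for the continuous-
    # function case; the linear readout is an added assumption.
    def __init__(self, d_x, d_y, num_layers):
        super().__init__()
        width = d_x + d_y + 2
        self.rnn = nn.RNN(input_size=d_x, hidden_size=width,
                          num_layers=num_layers, nonlinearity='relu',
                          batch_first=True)
        self.readout = nn.Linear(width, d_y)

    def forward(self, x):            # x: (batch, seq_len, d_x)
        h, _ = self.rnn(x)           # h: (batch, seq_len, d_x + d_y + 2)
        return self.readout(h)       # (batch, seq_len, d_y)

# Toy usage: map length-20 sequences in R^3 to sequences in R^2.
model = DeepNarrowRNN(d_x=3, d_y=2, num_layers=8)
print(model(torch.randn(4, 20, 3)).shape)   # torch.Size([4, 20, 2])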

    Do Not Escape From the Manifold: Discovering the Local Coordinates on the Latent Space of GANs

    Full text link
    The discovery of the disentanglement properties of the latent space in GANs has motivated a great deal of research into finding semantically meaningful directions in it. In this paper, we suggest that the disentanglement property is closely related to the geometry of the latent space. In this regard, we propose an unsupervised method for finding the semantic-factorizing directions on the intermediate latent space of GANs based on its local geometry. Intuitively, our proposed method, called Local Basis, finds the principal variations of the latent space in the neighborhood of a base latent variable. Experimental results show that these local principal variations correspond to semantic factorization, and that traversing along them provides strong robustness in image traversal. Moreover, we suggest an explanation for the limited success in finding global traversal directions in the latent space, especially the W-space of StyleGAN2. We show that W-space is warped globally by comparing the local geometry, discovered from Local Basis, through a metric on the Grassmannian manifold. This global warpage implies that the latent space is not well-aligned globally, and therefore global traversal directions are bound to show limited success on it. (Comment: 23 pages, 19 figures)
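    A minimal sketch of the local-geometry idea follows, assuming that the local principal variations can be approximated by the top right singular vectors of the Jacobian of a sub-network mapping the latent variable to an intermediate feature space; the stand-in network and the traversal scale are illustrative choices, not the paper's exact procedure.

import torch

def local_basis(submodel, w, k):
    # Hypothetical sketch: top-k right singular vectors of the Jacobian of a
    # sub-network mapping the latent variable w to an intermediate feature
    # space; each row is a latent direction of strongest local variation.
    J = torch.autograd.functional.jacobian(lambda x: submodel(x).flatten(), w)
    _, _, Vh = torch.linalg.svd(J.reshape(-1, w.numel()), full_matrices=False)
    return Vh[:k]

# Toy usage with a stand-in for a synthesis sub-network acting on a 512-dim W-space.
net = torch.nn.Sequential(torch.nn.Linear(512, 1024), torch.nn.ReLU(),
                          torch.nn.Linear(1024, 2048))
w = torch.randn(512)
directions = local_basis(net, w, k=10)
w_edited = w + 3.0 * directions[0]   # traverse along the leading local direction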

    Disentangling the correlated continuous and discrete generative factors of data

    No full text
    Real-world data typically include discrete generative factors, such as category labels and the existence of objects, as well as continuous generative factors. Continuous generative factors may be dependent on or independent of discrete generative factors. For instance, an intra-class variation of a category depends on the discrete generative factor, whereas a variation common to all categories does not. Most previous attempts to integrate discrete generative factors into disentanglement assumed statistical independence between the continuous and discrete variables. In this paper, we propose a Variational Autoencoder (VAE) model capable of disentangling both the continuous and discrete generative factors. To represent these generative factors, we introduce two sets of continuous latent variables: a private variable and a public variable. The private and public variables represent the intra-class variations and the variations common across categories, respectively. Our proposed framework models the private variable as a Gaussian mixture and the public variable as a Gaussian. Each mode of the private variable is responsible for a class of the discrete variable. Our proposed model, called Discond-VAE, DISentangles the class-dependent CONtinuous factors from the Discrete factors by introducing private variables. The experiments showed that Discond-VAE could discover private and public factors from the data. Moreover, even on a dataset with only public factors, Discond-VAE does not fail and adapts the private variables to represent public factors. (c) 2022 Elsevier Ltd. All rights reserved.
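    A minimal sketch of the latent structure described above, assuming a Gumbel-softmax relaxation for the discrete variable and diagonal Gaussians for the continuous ones; the encoder backbone, the KL terms, and all names are illustrative assumptions rather than the Discond-VAE implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscondEncoderSketch(nn.Module):
    # Hypothetical sketch of the latent structure only: a discrete class variable,
    # a class-dependent private variable with a Gaussian-mixture prior (one
    # learnable mode per class), and a class-independent public variable with a
    # standard Gaussian prior. Architecture and objective are illustrative.
    def __init__(self, x_dim, n_classes, private_dim, public_dim, hidden=256):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        self.class_logits = nn.Linear(hidden, n_classes)
        self.private_head = nn.Linear(hidden, 2 * private_dim)  # mean, log-variance
        self.public_head = nn.Linear(hidden, 2 * public_dim)    # mean, log-variance
        self.prior_means = nn.Parameter(torch.randn(n_classes, private_dim))

    def forward(self, x):
        h = self.backbone(x)
        y = F.gumbel_softmax(self.class_logits(h), tau=0.5, hard=True)  # discrete factor
        mu_v, logvar_v = self.private_head(h).chunk(2, dim=-1)  # private (class-dependent)
        mu_u, logvar_u = self.public_head(h).chunk(2, dim=-1)   # public (class-independent)
        v = mu_v + torch.randn_like(mu_v) * torch.exp(0.5 * logvar_v)
        u = mu_u + torch.randn_like(mu_u) * torch.exp(0.5 * logvar_u)
        # KL of the private variable is taken against the mixture mode selected by y;
        # KL of the public variable is taken against a standard Gaussian.
        prior_mu_v = y @ self.prior_means
        kl_v = 0.5 * torch.sum(torch.exp(logvar_v) + (mu_v - prior_mu_v) ** 2 - 1 - logvar_v, dim=-1)
        kl_u = 0.5 * torch.sum(torch.exp(logvar_u) + mu_u ** 2 - 1 - logvar_u, dim=-1)
        return y, v, u, kl_v + kl_u

# Toy usage on flattened 28x28 inputs with 10 classes.
enc = DiscondEncoderSketch(x_dim=784, n_classes=10, private_dim=8, public_dim=8)
y, v, u, kl = enc(torch.randn(32, 784))
print(y.shape, v.shape, u.shape, kl.shape)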